Combining Multiple Clusterings Using Evidence Accumulation

Authors

  • Ana L. N. Fred
  • Anil K. Jain
Abstract

We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble, that is, a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), data partitions can be produced in different ways: 1) by applying different clustering algorithms and 2) by applying the same clustering algorithm with different parameter values or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as independent evidence of data organization; the individual data partitions are combined, through a voting mechanism, to generate a new n x n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm to this matrix. We have developed a theoretical framework for the analysis of the proposed clustering combination strategy and its evaluation, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split and merge strategy based on the K-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies, and with individual clustering results produced by well-known clustering algorithms.
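The voting step described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a plain Lloyd's K-means as the base algorithm, an ensemble of 30 runs with k cycling through 2, 3, and 4, votes normalized by the number of partitions, single-link merging, and a fixed target of two final clusters. The function names (`kmeans_labels`, `coassociation`, `single_link`) and the toy data are hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations

def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's k-means on tuples; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initialization from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def coassociation(partitions, n):
    """n x n matrix: fraction of partitions in which two objects share a cluster."""
    votes = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1.0  # one vote per co-occurrence
    return [[v / len(partitions) for v in row] for row in votes]

def single_link(sim, n_clusters):
    """Agglomerative merging on a similarity matrix (single link, an assumption)."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters whose closest members are most similar.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda ab: max(sim[i][j]
                               for i in clusters[ab[0]]
                               for j in clusters[ab[1]]),
        )
        clusters[a].extend(clusters.pop(b))  # a < b, so index a stays valid
    labels = [0] * len(sim)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

# Toy data (hypothetical): two well-separated groups of four 2-D points.
points = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6), (0.7, 0.7),
          (10.0, 10.0), (10.4, 10.1), (10.2, 10.7), (10.8, 10.5)]

rng = random.Random(0)
# Ensemble: the same algorithm with different k and different initializations.
partitions = [kmeans_labels(points, 2 + run % 3, rng) for run in range(30)]
sim = coassociation(partitions, len(points))
final = single_link(sim, n_clusters=2)
print(final)
```

Individual runs with k = 3 or k = 4 over-segment the data, but pairs of points from the same group still collect many votes across the ensemble, so the merging step can recover the two groups. Other linkage rules or ways of choosing the final number of clusters could be substituted without changing the voting step.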

Similar resources

Data Clustering Using Evidence Accumulation

We explore the idea of evidence accumulation for combining the results of multiple clusterings. Initially, n d-dimensional data is decomposed into a large number of compact clusters; the K-means algorithm performs this decomposition, with several clusterings obtained by N random initializations of the K-means. Taking the co-occurrences of pairs of patterns in the same cluster as votes for their ...

Combining Data Clusterings with Instance Level Constraints

Recent work has focused on the incorporation of a priori knowledge into the data clustering process, in the form of pairwise constraints, aiming to improve clustering quality and find appropriate clustering solutions to specific tasks or interests. In this work, we integrate must-link and cannot-link constraints into the cluster ensemble framework. Two algorithms for combining multiple data partit...

Evidence Accumulation Clustering Based on the K-Means Algorithm

The idea of evidence accumulation for the combination of multiple clusterings was recently proposed [7]. Taking the K-means as the basic algorithm for the decomposition of data into a large number, k, of compact clusters, evidence on pattern association is accumulated, by a voting mechanism, over multiple clusterings obtained by random initializations of the K-means algorithm. This produces a m...

Combining multiple clusterings using similarity graph

Multiple clusterings are produced for various needs and reasons in both distributed and local environments. Combining multiple clusterings into a final clustering which has better overall quality has gained importance recently. It is also expected that the final clustering is novel, robust, and scalable. In order to solve this challenging problem we introduce a new graph-based method. Our metho...

Computation of Initial Modes for K-modes Clustering Algorithm Using Evidence Accumulation

The clustering accuracy of a partitional clustering algorithm for categorical data depends primarily on the choice of initial data points used to start the clustering process; hence, the clustering results cannot be generated and repeated consistently. In this paper, we present an approach to compute initial modes for the K-modes partitional clustering algorithm to cluster categorical data sets. Here we u...

Journal:
  • IEEE Transactions on Pattern Analysis and Machine Intelligence

Volume 27, Issue 6

Pages: -

Publication date: 2005